Abstract

Asymmetric systematic errors arise when there is a non-linear dependence of a result 
on a nuisance parameter. Their combination is traditionally done by adding positive and 
negative deviations separately in quadrature. There is no sound justification for this, and 
it is shown that indeed it is sometimes clearly inappropriate. Consistent techniques are 
given for this combination of errors, and also for evaluating $\chi^2$, and for forming weighted
sums.



1. Introduction 

Although most errors on physics results are Gaussian, there are occasions where the 
Gaussian form no longer holds, and indeed when the distribution is not even symmetric. 

This can occur for statistical errors, when the one-$\sigma$ interval is read off a log likelihood
curve which is not well described by a parabola [1]. It can also arise in evaluating systematic
errors: if a 'nuisance parameter' $a$ which affects the result $x$ has an uncertainty described
by a Gaussian distribution with mean $\mu_a$ and standard deviation $\sigma_a$, then the uncertainty
in $a$ produces an uncertainty in $x$ given to first order by the standard combination of errors
formula:

$$\sigma_x = \left| \frac{dx}{da} \right| \sigma_a$$

The uncertainty in a may be frequentist (for example, a Monte Carlo parameter deter- 
mined by another experiment) or Bayesian (for example, a Monte Carlo parameter set 
by judgement of theorists). Bayesian probabilities may be admissible even in basically
frequentist analyses if the effects are small [2]. The assumption that a has a Gaussian 
probability distribution may be questioned, but that brings in further complications we do 
not wish to consider here. 

If the differential is not known analytically a numerical evaluation can be done, most
conveniently by evaluation of $x(\mu_a + \sigma_a)$ and $x(\mu_a - \sigma_a)$. See [3] for a discussion of the
procedure and some issues that may arise.

Both $x(\mu_a + \sigma_a) - x(\mu_a)$ and $x(\mu_a) - x(\mu_a - \sigma_a)$ give estimates of the uncertainty $\sigma_x$.
If they are different then this is a sign that the dependence is non-linear and the symmetric
distribution in $a$ gives an asymmetric distribution in $x$.
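
The numerical evaluation described above can be sketched in a few lines. This is an illustration only: the function name `asymmetric_errors` and the toy dependence `x_of_a` are assumptions introduced here, not part of the analysis in the text.

```python
def asymmetric_errors(x_of_a, mu_a, sigma_a):
    """Return (sigma_plus, sigma_minus) from three evaluations of x(a)."""
    x0 = x_of_a(mu_a)
    sigma_plus = x_of_a(mu_a + sigma_a) - x0   # shift in x for a +1 sigma change in a
    sigma_minus = x0 - x_of_a(mu_a - sigma_a)  # shift in x for a -1 sigma change in a
    return sigma_plus, sigma_minus


def x_of_a(a):
    """Hypothetical non-linear dependence, used only to illustrate the procedure."""
    return 2.0 * a + 0.3 * a * a


sp, sm = asymmetric_errors(x_of_a, mu_a=1.0, sigma_a=0.5)
print(sp, sm)  # the two estimates differ, signalling a non-linear dependence
```

For a linear `x_of_a` the two numbers coincide and the error can be quoted as symmetric.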

The questions that can be asked are: 

• How should asymmetric errors be combined? 

• How should a $\chi^2$ be formed?

• How should a weighted mean be formed from results with asymmetric errors?

Current practice is to combine such errors separately, i.e. to add the $\sigma^+$ values together
in quadrature, and then do the same thing for the $\sigma^-$ values. This is not, to my knowledge,
documented anywhere and, as will be shown, is certainly wrong.

2. Models 

The analysis gives 3 co-ordinate pairs: $(a - \sigma_a, x - \sigma^-)$, $(a, x)$ and $(a + \sigma_a, x + \sigma^+)$.
In practice there are errors on these points, and one might be well advised to assume a 
straight line dependence and take the error as symmetric, however we will assume that 
this is not a case where this is appropriate. Again, faced with a real non-linear dependence 
one might well be advised to map out more than three points; we will likewise assume that 
this is not done. We consider cases where a non-linear effect is not small enough to be 
ignored entirely, but not large enough to justify a long and intensive investigation. Such 
cases are common in practice. 

For simplicity we transform $a$ to the variable $u$ described by a unit Gaussian, and
work with $X(u) = x(u) - x(0)$. For future convenience it is useful to define the mean $\sigma$,
the difference $\alpha$, and the asymmetry $A$:

$$\sigma = \frac{\sigma^+ + \sigma^-}{2} \qquad \alpha = \frac{\sigma^+ - \sigma^-}{2} \qquad A = \frac{\sigma^+ - \sigma^-}{\sigma^+ + \sigma^-} \qquad (1)$$

There are infinitely many non-linear relationships between a and X that will go 
through these three points. We consider two. 

Model 1 : Two straight lines

Two straight lines are drawn, meeting at the central value

$$X = \sigma^+ u \quad (u \geq 0), \qquad X = \sigma^- u \quad (u \leq 0) \qquad (2)$$



Model 2 : A quadratic function 

The parabola through the three points is 

$$X = \sigma u + \alpha u^2 = \sigma u + A \sigma u^2 \qquad (3)$$

These forms are shown in Figure 1 for a small asymmetry of 0.1, and a larger asymmetry
of 0.4.

Figure 1: $X$ (vertically) against $u$ (horizontally)

Model 1 (two straight lines) is shown in red, and Model 2 ($X$ as a quadratic function of
$u$) in green. Both go through the 3 specified points. The differences between them within
the range $-1 < u < 1$ are not large; outside that range they diverge considerably.
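
The two models can be written down directly from Equations 1 to 3. The sketch below is a minimal illustration; the function names and the numerical values of $\sigma^+$ and $\sigma^-$ are assumptions chosen to give an asymmetry of 0.4.

```python
def summary(sigma_plus, sigma_minus):
    """Mean sigma, difference alpha and asymmetry A of Equation 1."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    return sigma, alpha, alpha / sigma


def model1(u, sigma_plus, sigma_minus):
    """Two straight lines meeting at u = 0 (Equation 2)."""
    return sigma_plus * u if u >= 0 else sigma_minus * u


def model2(u, sigma_plus, sigma_minus):
    """Parabola through the three points (Equation 3)."""
    sigma, alpha, _ = summary(sigma_plus, sigma_minus)
    return sigma * u + alpha * u * u


sp, sm = 1.4, 0.6  # illustrative values, asymmetry A = 0.4
for u in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(u, model1(u, sp, sm), model2(u, sp, sm))
```

Both functions agree at $u = -1, 0, 1$ by construction; at $u = \pm 2$ they already differ noticeably, and for this asymmetry the parabola has passed its turning point at $u = -\sigma/(2\alpha) = -1.25$.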

We have no knowledge of whether either of them is better than the other in a particular 
case. Model 1 has a kink at $u = 0$, which is unphysical. Model 2 has a turning point, which
may well be unrealistic (though it only gets into the relevant region if A is fairly large.) 
The practitioner may select one of the two - or some other model - on the basis of their 
knowledge of the problem, or preference and experience. Working with asymmetric errors 
at all involves the assumption of some model for the non-linearity. The 'correctness' of 
any model may be arguable, but once chosen it must be used consistently. 

The distribution in $u$ is a unit Gaussian, $G(u)$, and the distribution in $X$ is obtained
from $P(X) = G(u) / |dX/du|$. For Model 1 this gives a dimidated Gaussian - two Gaussians with
different standard deviation for $X > 0$ and $X < 0$ †. For Model 2 with small asymmetries
the curve is a distorted Gaussian, given by

$$P(X) = \frac{G(u)}{|\sigma + 2\alpha u|} \quad \text{with} \quad u = \frac{\sqrt{\sigma^2 + 4\alpha X} - \sigma}{2\alpha}$$

For larger asymmetries and/or larger $|X|$ values, the second root also has to be considered.
Examples are shown in Figure 2.

Figure 2: Examples of the distributions from combined asymmetric errors.
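
As a check on the change of variables, the two densities can be coded and integrated numerically. This is a sketch under stated assumptions: the function names, the midpoint-rule integrator and the test values are introduced here for illustration, and the Model 2 density sums both roots of the quadratic rather than only the first.

```python
import math


def gauss(u):
    """Unit Gaussian density G(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pdf_model1(X, sigma_plus, sigma_minus):
    """Dimidated Gaussian: G(u)/|dX/du|, with the slope set by the sign of X."""
    s = sigma_plus if X >= 0 else sigma_minus
    return gauss(X / s) / s


def pdf_model2(X, sigma_plus, sigma_minus):
    """Distorted Gaussian, summing both roots of X = sigma*u + alpha*u**2."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    if alpha == 0.0:
        return gauss(X / sigma) / sigma
    disc = sigma * sigma + 4.0 * alpha * X
    if disc <= 0.0:
        return 0.0  # X lies beyond the turning point of the parabola
    root = math.sqrt(disc)  # equals |sigma + 2*alpha*u| at either root
    roots = ((-sigma + root) / (2.0 * alpha), (-sigma - root) / (2.0 * alpha))
    return sum(gauss(u) for u in roots) / root


def integrate(f, lo, hi, n=20000):
    """Plain midpoint rule, adequate for a normalisation check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


sp, sm = 1.1, 0.9
norm1 = integrate(lambda X: pdf_model1(X, sp, sm), -10.0, 10.0)
norm2 = integrate(lambda X: pdf_model2(X, sp, sm), -2.5, 12.0)
print(norm1, norm2)  # both should be close to 1
```

The lower limit of the second integral is the turning-point value $X = -\sigma^2/(4\alpha)$ for these inputs, below which the Model 2 density vanishes.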



† This is sometimes called a 'bifurcated Gaussian', but this is inaccurate. 'Bifurcated'
means 'split' in the sense of forked. 'Dimidated' means 'cut in half', with the subsidiary
meaning of 'having one part much smaller than the other' [4].

It can be seen that the Model 1 dimidated Gaussian and Model 2 distorted Gaussian
are not dissimilar if the asymmetry is small, but are very different if the asymmetry is 
large. Again, in a particular case there is no unique reason for choosing one above the 
other in the absence of further information. 

3. Bias 

If a nuisance parameter $u$ is distributed with a Gaussian probability distribution, and
the quantity $X(u)$ is a nonlinear function of $u$, then the expectation $\langle X \rangle$ is not $X(\langle u \rangle)$.
For Model 1 one has

$$\langle X \rangle = \int_{-\infty}^{0} \sigma^- u \frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du + \int_{0}^{\infty} \sigma^+ u \frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \qquad (4)$$

For Model 2 one has

$$\langle X \rangle = \int_{-\infty}^{\infty} (\sigma u + \alpha u^2) \frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du = \frac{\sigma^+ - \sigma^-}{2} = \alpha \qquad (5)$$

Hence in these models, or others, if the result quoted is $X(0)$, it is not the mean. It
is perhaps defensible as a number to quote as the result as it is still the median - there is
a 50% chance that the true value is below it and a 50% chance that it is above.
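
The two bias results can be checked by integrating over the unit Gaussian numerically. The sketch below assumes illustrative values $\sigma^+ = 1.2$, $\sigma^- = 0.8$ and a simple midpoint rule; the helper name `expectation` is introduced here and is not from the text.

```python
import math


def expectation(X_of_u, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule estimate of <X> for u distributed as a unit Gaussian."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += X_of_u(u) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)


sp, sm = 1.2, 0.8  # illustrative values
sigma, alpha = 0.5 * (sp + sm), 0.5 * (sp - sm)

mean1 = expectation(lambda u: sp * u if u >= 0 else sm * u)
mean2 = expectation(lambda u: sigma * u + alpha * u * u)

print(mean1, (sp - sm) / math.sqrt(2.0 * math.pi))  # compare with Equation 4
print(mean2, alpha)                                 # compare with Equation 5
```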

4. Adding Errors 

If a derived quantity z contains parts from two quantities x and y, so that z = x + y, 
the distribution in z is given by the convolution: 

$$f_z(z) = \int dx\, f_x(x) f_y(z - x) \qquad (6)$$

With Model 1 the function for $z > 0$ can be written:

$$f_z(z) = \int_{-\infty}^{0} dx\, f_x^-(x) f_y^+(z - x) + \int_{0}^{z} dx\, f_x^+(x) f_y^+(z - x) + \int_{z}^{\infty} dx\, f_x^+(x) f_y^-(z - x)$$

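Inserting the appropriate Gaussian functions and using

$$\sigma_{++}^2 = (\sigma_x^+)^2 + (\sigma_y^+)^2 \qquad \sigma_{+-}^2 = (\sigma_x^+)^2 + (\sigma_y^-)^2 \qquad \sigma_{-+}^2 = (\sigma_x^-)^2 + (\sigma_y^+)^2 \qquad \sigma_{--}^2 = (\sigma_x^-)^2 + (\sigma_y^-)^2$$

this gives an expression for $f_z(z)$ in terms of $g$ and $\bar{g}$, where $g(x)$ is the cumulative
Gaussian, equivalent to $\frac{1}{2}(1 + \mathrm{erf}(x))$, and $\bar{g}(x) = 1 - g(x)$.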

Figure 3: Examples of the distributions from combined asymmetric errors.

Figure 3 shows the distributions from some typical cases. The blue line shows the con- 
volution, the black line is obtained by adding the positive and negative standard deviations 
separately in quadrature (the 'usual procedure'). 

The agreement is not good. It is apparent that the skew of the distribution obtained 
from the convolution is smaller than that obtained from the usual procedure. This is 
obvious: if two distributions with the same asymmetry are added then the 'usual procedure' 
will give a distribution with the same asymmetry. This violates the Central Limit Theorem, 
which says that convoluting identical distributions must result in a combined distribution 
which is more Gaussian, and therefore more symmetric, than its components. This shows 
that the 'usual procedure' for adding asymmetric errors is inconsistent. Even though, as 
stated earlier, there is no guarantee that Model 1 or any model is correct, once a model 
has been adopted it should be handled in a consistent fashion, and the 'usual procedure' 
fails to do this. 
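
The Central Limit Theorem argument can be made quantitative for Model 1. The sketch below relies on two facts taken from Section 5: variance and unnormalised skew add under convolution, and for a dimidated Gaussian they are given by Equation 8. The function names and the input values are assumptions made for illustration.

```python
import math


def model1_cumulants(sp, sm):
    """Variance and unnormalised skew of a dimidated Gaussian (Equation 8)."""
    d = sp - sm
    s = sp * sp + sm * sm
    var = 0.5 * s - d * d / (2.0 * math.pi)
    skew = (2.0 * (sp**3 - sm**3) - 1.5 * d * s + d**3 / math.pi) / math.sqrt(2.0 * math.pi)
    return var, skew


def normalised_skew(var, skew):
    """Dimensionless skewness, skew / variance**1.5."""
    return skew / var**1.5


sp, sm = 1.5, 0.5  # illustrative values
v, g = model1_cumulants(sp, sm)
single = normalised_skew(v, g)

# True convolution of two identical contributions: the cumulants simply add.
convolved = normalised_skew(2.0 * v, 2.0 * g)

# 'Usual procedure': sigma+ and sigma- are each scaled by sqrt(2).
usual = normalised_skew(*model1_cumulants(math.sqrt(2.0) * sp, math.sqrt(2.0) * sm))

print(single, convolved, usual)
```

The convolved skewness is smaller than the single-contribution value by a factor $1/\sqrt{2}$, whereas the usual procedure leaves it unchanged.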

5. A consistent addition technique

If a distribution for $x$ is described by some 3 parameter function, $f(x; x_0, \sigma^+, \sigma^-)$,
which is a Gaussian transformed according to Model 1 or Model 2 or anything else, then
'combination of errors' involves a convolution of two such functions according to Equation 6.
This combined function is not necessarily a function of the same form. It is a special
property of the Gaussian that the convolution of two Gaussians gives a third. Figure 3 is
a demonstration of this. The convolution of two dimidated Gaussians is not a dimidated
Gaussian.

Although the form of the function is changed by a convolution, some things are preserved.
The semi-invariant cumulants of Thiele (the coefficients of the power series expansion
of the log of the Fourier Transform) add under convolution. The first two of these are
the usual mean and variance. The third is the unnormalised skew:

$$\gamma = \langle x^3 \rangle - 3 \langle x \rangle \langle x^2 \rangle + 2 \langle x \rangle^3 \qquad (7)$$

Within the context of any model, a rational approach to the combination of errors is
to find the mean, variance and skew: $\mu$, $V$ and $\gamma$, for each contributing function separately.
Adding these up gives the mean, variance and skew of the combined function. Working
within the model one then determines the values of $\sigma^-$, $\sigma^+$ and $x_0$ that give this mean,
variance and skew.

5.1 Model 1 

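For Model 1, for which $\langle x^3 \rangle = \frac{2}{\sqrt{2\pi}} \left( (\sigma^+)^3 - (\sigma^-)^3 \right)$, we have

$$\mu = x_0 + \frac{1}{\sqrt{2\pi}} (\sigma^+ - \sigma^-)$$

$$V = \sigma^2 + \alpha^2 \left( 1 - \frac{2}{\pi} \right) \qquad (8)$$

$$\gamma = \frac{1}{\sqrt{2\pi}} \left[ 2 \left( (\sigma^+)^3 - (\sigma^-)^3 \right) - \frac{3}{2} (\sigma^+ - \sigma^-) \left( (\sigma^+)^2 + (\sigma^-)^2 \right) + \frac{1}{\pi} (\sigma^+ - \sigma^-)^3 \right]$$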



So given a set of error contributions, Equations (8) give the cumulants $\mu$, $V$ and
$\gamma$ of each. The first three cumulants of the combined distribution are given by adding up the
individual contributions. Then one can find the set of parameters $\sigma^-$, $\sigma^+$, $x_0$ which give
these values by using Equations (8) in the other sense.

It is convenient to work with $\Delta$, where $\Delta$ is the difference between the final $x_0$ and
the sum of the individual ones. The parameter is needed because of the bias mentioned
earlier. Even though each contribution may have $x_0 = 0$, i.e. it describes a spread about
the quoted result, it has non-zero $\mu_i$ through the bias effect (c.f. Equation 4). The $\sigma^+$ and
$\sigma^-$ of the combined distribution, obtained from the total $V$ and $\gamma$, will in general not give
the right $\mu$ unless a location shift $\Delta$ is added. The value of the quoted result will shift.

Recalling Section 3, for a dimidated Gaussian one could defend quoting the central
value as it was the median, even though it was not the mean. The convoluted distribution
not only has a non-zero mean, it also (as can be seen in Figure 2) has non-zero median.
Consider two dimidated Gaussians with, say, $\sigma^+ > \sigma^-$, which are convoluted. There is
a 25% chance that both will contribute a negative value, a similar 25% chance that both
will be positive, and a 50% chance of getting one positive and one negative contribution -
which will probably be positive overall (as $\sigma^+ > \sigma^-$). So for two combined distributions
the zero value may lie as far away as the 25th percentile.
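
The percentile argument can be checked with a small Monte Carlo. The sketch below is an illustration under stated assumptions: two identical Model 1 contributions, arbitrary input values, and a fixed random seed. The closed form used for comparison, $P(z < 0) = \frac{1}{4} + \frac{1}{\pi}\arctan(\sigma^-/\sigma^+)$, is not from the text; it follows from the ratio of two independent half-normal variables being half-Cauchy distributed.

```python
import math
import random


def sample_model1(rng, sp, sm):
    """Draw one value from a dimidated Gaussian centred on zero."""
    u = rng.gauss(0.0, 1.0)
    return sp * u if u >= 0 else sm * u


def prob_sum_below_zero(sp, sm, n=200_000, seed=1):
    """Fraction of sums of two identical contributions that fall below zero."""
    rng = random.Random(seed)
    below = sum(
        1 for _ in range(n)
        if sample_model1(rng, sp, sm) + sample_model1(rng, sp, sm) < 0.0
    )
    return below / n


sp, sm = 1.5, 0.5  # illustrative values
mc = prob_sum_below_zero(sp, sm)
exact = 0.25 + math.atan(sm / sp) / math.pi
print(mc, exact)
```

As $\sigma^-/\sigma^+$ tends to zero the probability tends to 25%, and for symmetric errors it returns to 50%.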

If you want to combine asymmetric errors then you have to accept that the quoted 
value will shift. To make this correction requires a real belief in the asymmetry of the error 
values. At this point the practitioner, unless they are really sure that their errors really 
do have a significant asymmetry, may be persuaded to revert to quoting symmetric errors. 

Solving Equations (8) for $\sigma^-$, $\sigma^+$, $x_0$ given $\mu$, $V$ and $\gamma$ has to be done numerically.
If we write $D = \sigma^+ - \sigma^-$ and $S = (\sigma^-)^2 + (\sigma^+)^2$ then the equations

$$S = 2V + \frac{D^2}{\pi} \qquad D = \frac{2}{3S} \left( \sqrt{2\pi}\,\gamma - D^3 \left( \frac{1}{\pi} - 1 \right) \right) \qquad (9)$$

can be solved by repeated substitution (starting with $D = 0$). Then $\Delta$ is given by

$$\Delta = \mu - \frac{D}{\sqrt{2\pi}} \qquad (10)$$

A program for this is available on http://www.slac.stanford.edu/~barlow. Some results
are shown in Figure 4 and Table 1.
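
The procedure of Equations 8 to 10 can be sketched as follows. This is an independent illustration, not the program referred to above: the function names, the convergence tolerance and the iteration cap are assumptions, and each contribution is taken to have $x_0 = 0$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def model1_cumulants(sp, sm, x0=0.0):
    """Mean, variance and unnormalised skew from Equation 8."""
    d = sp - sm
    s = sp * sp + sm * sm
    mean = x0 + d / SQRT_2PI
    var = 0.5 * s - d * d / (2.0 * math.pi)
    skew = (2.0 * (sp**3 - sm**3) - 1.5 * d * s + d**3 / math.pi) / SQRT_2PI
    return mean, var, skew


def combine_model1(errors, tol=1e-12, max_iter=1000):
    """Combine (sigma_plus, sigma_minus) pairs; return (sigma_plus, sigma_minus, shift)."""
    mean = var = skew = 0.0
    for sp, sm in errors:
        m, v, g = model1_cumulants(sp, sm)
        mean, var, skew = mean + m, var + v, skew + g

    d = 0.0
    for _ in range(max_iter):  # repeated substitution, Equation 9
        s = 2.0 * var + d * d / math.pi
        d_new = 2.0 / (3.0 * s) * (SQRT_2PI * skew - d**3 * (1.0 / math.pi - 1.0))
        converged = abs(d_new - d) < tol
        d = d_new
        if converged:
            break

    s = 2.0 * var + d * d / math.pi
    total = math.sqrt(2.0 * s - d * d)  # sigma_plus + sigma_minus
    shift = mean - d / SQRT_2PI         # Equation 10
    return 0.5 * (total + d), 0.5 * (total - d), shift


print(combine_model1([(1.5, 0.5), (1.5, 0.5)]))
```

A useful sanity check is that combining a single contribution returns its own $\sigma^+$ and $\sigma^-$ with zero shift, and that feeding the output back into `model1_cumulants` reproduces the summed cumulants.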

Figure 4: Examples of combined errors with the correct first 3 cumulants using Model 1.

1.52  1.61  0.16  1.78  0.28  0.5  0.5  1.5  0.97  1.93  0.41

Table 1: The values used in Figure 4

Comparing Figure 4 and Figure 3 (note that the blue curves are the same in both 
figures; the consistent technique is shown in purple), it is apparent that the new technique 
does a very much better job than the old. It is not an exact match, but does an acceptable 
job given that there are only 3 adjustable parameters in the function. 

5.2 Model 2 

In terms of the difference $\alpha = (\sigma^+ - \sigma^-)/2$ and the mean $\sigma = (\sigma^+ + \sigma^-)/2$ the
moments are

$$\langle x \rangle = \alpha \qquad \langle x^2 \rangle = \sigma^2 + 3\alpha^2 \qquad \langle x^3 \rangle = 9\alpha\sigma^2 + 15\alpha^3$$

Giving

$$\mu = x_0 + \alpha \qquad V = \sigma^2 + 2\alpha^2 \qquad \gamma = 6\sigma^2\alpha + 8\alpha^3 \qquad (11)$$

As with Model 1, these are used to find the cumulants of each contributing distribution,
which are summed to give the three totals, and then Equation 11 is used again to find the
parameters of the distorted Gaussian with this mean, variance and skew. There is only
one equation to be solved numerically, again by iteration

$$\alpha = \frac{\gamma}{6V - 4\alpha^2} \qquad (12)$$

after which $\sigma = \sqrt{V - 2\alpha^2}$ and $\Delta = \mu - \alpha$.
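
The Model 2 combination can be sketched in the same way. Again this is an illustration rather than the author's program: the function names, tolerance and starting value $\alpha = 0$ are assumptions, and no safeguard is included for very large asymmetries where $6V - 4\alpha^2$ or $V - 2\alpha^2$ could become non-positive.

```python
import math


def model2_cumulants(sp, sm, x0=0.0):
    """Mean, variance and unnormalised skew from Equation 11."""
    sigma = 0.5 * (sp + sm)
    alpha = 0.5 * (sp - sm)
    return x0 + alpha, sigma**2 + 2.0 * alpha**2, 6.0 * sigma**2 * alpha + 8.0 * alpha**3


def combine_model2(errors, tol=1e-12, max_iter=1000):
    """Combine (sigma_plus, sigma_minus) pairs; return (sigma_plus, sigma_minus, shift)."""
    mean = var = skew = 0.0
    for sp, sm in errors:
        m, v, g = model2_cumulants(sp, sm)
        mean, var, skew = mean + m, var + v, skew + g

    alpha = 0.0
    for _ in range(max_iter):  # iterate Equation 12
        alpha_new = skew / (6.0 * var - 4.0 * alpha * alpha)
        converged = abs(alpha_new - alpha) < tol
        alpha = alpha_new
        if converged:
            break

    sigma = math.sqrt(var - 2.0 * alpha * alpha)
    return sigma + alpha, sigma - alpha, mean - alpha


print(combine_model2([(1.2, 0.8), (1.2, 0.8)]))
```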

Some results are shown in Figure 5 and Table 2. The true convolution cannot be done 
analytically but can be done by a Monte Carlo calculation. 

Table 2: The values used for the curves with correct cumulants in Figure 5.

Figure 5: Examples of combined errors using Model 2.

Again the true curves (blue) are not well reproduced by the 'usual procedure' (black) 
whereas the curves with the correct cumulants (purple) do a very reasonable job. (The 
sharp behaviour at the lower edge of the curves is due to the minimum value of y.) 

The web program mentioned earlier will also do the calculations for Model 2. 



6. Evaluating $\chi^2$

For Model 1 the $\chi^2$ contribution from a discrepancy $\delta$ is just $\delta^2/(\sigma^+)^2$ or $\delta^2/(\sigma^-)^2$ as
appropriate. This is manifestly inelegant, especially for minimisation procedures as the
value goes through zero.

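For Model 2 one has

$$\delta = \sigma u + A \sigma u^2$$

This can be considered as a quadratic for $u$ with solution

$$u = \frac{\sqrt{1 + 4A\frac{\delta}{\sigma}} - 1}{2A}$$

Squaring gives $u^2$, the $\chi^2$ contribution, as

$$u^2 = \frac{2 + 4A\frac{\delta}{\sigma} - 2\sqrt{1 + 4A\frac{\delta}{\sigma}}}{4A^2}$$
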
This is not really exact, in that it only takes one branch of the solution, the one approximating
to the straight line, and does not consider the extra possibility that the $\delta$ value
could come from an improbable $u$ value the other side of the turning point of the parabola.
Given this imperfection it makes sense to expand the square root as a Taylor series, which,
neglecting correction terms above the second power, leads to

$$\chi^2 = \left( \frac{\delta}{\sigma} \right)^2 \left( 1 - 2A \left( \frac{\delta}{\sigma} \right) + 5A^2 \left( \frac{\delta}{\sigma} \right)^2 \right) \qquad (13)$$

The first order approximation to this is

$$\chi^2 = \left( \frac{\delta}{\sigma} \right)^2 \left( 1 - 2A \left( \frac{\delta}{\sigma} \right) \right) \qquad (14)$$

This can be modified to a form forced to give $\chi^2 = 1$ for deviations of $+\sigma^+$ and $-\sigma^-$.

Figure 6: $\chi^2$ approximations

Figure 6 shows these forms. The black line is the simplest $\chi^2 = (\delta/\sigma)^2$ form. The
green is the full form involving the square root. It goes to $+\infty$ for values beyond the
turning point which in principle can never happen. The blue line is the third order form
of Equation 14 and the red line is the higher order Equation 13. The yellow is Equation
15, the first order form constrained to go through unity at $+\sigma^+$ and $-\sigma^-$, shown by the
two crosses. For a 10% asymmetry all the approximations are pretty well equivalent and
a significantly better form than the simplest one. For a larger 20% asymmetry the lower
order forms show undesirable behaviour, turning over for a moderate ($2\sigma$) deviation.

We therefore suggest that Equation 13 be used. The even power ensures that $\chi^2$
does not turn over but increases at large deviations, which is desirable. It does not go to
infinity when $\delta$ approaches the turning point, which is probably a good feature. A poor
determination of the parameters of Equation 3 could give an unrealistic minimum value
which could be exceeded by an experimental value, and one would not want this to give
an undefined $\chi^2$.

Higher order (5th, 6th...) terms do not significantly improve the agreement with the 
full (green) curve. 
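
The forms compared in this section can be evaluated side by side. The sketch below is illustrative only: the function names are assumptions, `chi2_exact` takes the branch of the quadratic that approximates the straight line and returns infinity beyond the turning point, and the constrained form of Equation 15 is not included.

```python
import math


def chi2_exact(delta, sigma, A):
    """Square of the root of delta = sigma*u + A*sigma*u**2 nearest the straight line."""
    if A == 0.0:
        return (delta / sigma) ** 2
    arg = 1.0 + 4.0 * A * delta / sigma
    if arg < 0.0:
        return math.inf  # delta lies beyond the turning point
    return ((math.sqrt(arg) - 1.0) / (2.0 * A)) ** 2


def chi2_eq13(delta, sigma, A):
    """Taylor expansion kept to the A**2 term (Equation 13)."""
    t = delta / sigma
    return t * t * (1.0 - 2.0 * A * t + 5.0 * (A * t) ** 2)


def chi2_eq14(delta, sigma, A):
    """Lowest-order correction only (Equation 14)."""
    t = delta / sigma
    return t * t * (1.0 - 2.0 * A * t)


sigma, A = 1.0, 0.2  # illustrative 20% asymmetry
for delta in (-2.0, -0.8, 0.0, 1.2, 2.0, 3.0):
    print(delta, chi2_exact(delta, sigma, A), chi2_eq13(delta, sigma, A), chi2_eq14(delta, sigma, A))
```

With these inputs the exact form gives unity at $\delta = +\sigma^+ = 1.2$ and $\delta = -\sigma^- = -0.8$, while Equation 14 has turned negative by $\delta = 3$ and Equation 13 keeps rising.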



7. Weighted means 

Suppose a value $x$ has been measured several times, $x_1, x_2 \ldots x_n$, each measurement
having its own $\sigma_i^+$ and $\sigma_i^-$. For the usual symmetric errors the 'best' estimate (i.e. unbiassed
and with smallest variance) is given by the weighted sum

$$\hat{x} = \frac{\sum w_i x_i}{\sum w_i}$$

with $w_i = 1/\sigma_i^2$. We wish to find the equivalent for asymmetric errors.

As noted in Section 3, when sampling from an asymmetric distribution the result is
biassed towards the high tail. The expectation value $\langle x \rangle$ is not the location parameter $x$.
So for an unbiassed estimator one has to take

$$\hat{x} = \frac{\sum w_i (x_i - b_i)}{\sum w_i} \qquad (16)$$

where

$$b = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \ \text{(Model 1)} \qquad b = \alpha \ \text{(Model 2)} \qquad (17)$$

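
The variance of this is given by

$$V = \frac{\sum w_i^2 V_i}{\left( \sum w_i \right)^2}$$

where $V_i$ is the variance of the $i$th measurement about its mean.
Differentiating with respect to $w_i$ to find the minimum gives

$$\frac{2 w_i V_i}{\left( \sum w_j \right)^2} - \frac{2 \sum w_j^2 V_j}{\left( \sum w_j \right)^3} = 0$$

which is satisfied by $w_i = 1/V_i$. This is the equivalent of the familiar weighting by $1/\sigma^2$.
The weights are given by (see Equations 8 and 11)

$$V = \sigma^2 + \left( 1 - \frac{2}{\pi} \right) \alpha^2 \ \text{(Model 1)} \qquad V = \sigma^2 + 2\alpha^2 \ \text{(Model 2)} \qquad (18)$$
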

Note that this is not the ML estimator - writing down the likelihood in terms of the
$\chi^2$ of Equation 13 and differentiating does not give a nice form - so in principle there
may be better estimators, but they will not have the simple form of a weighted sum.
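
The weighted mean of Equations 16 to 18 can be sketched as below. The function names and the example measurements are assumptions made for illustration; the variance returned is that of the estimator with weights $w_i = 1/V_i$, which reduces to $1/\sum (1/V_i)$.

```python
import math


def bias_and_variance(sp, sm, model=1):
    """Bias b (Equation 17) and variance V (Equation 18) for one measurement."""
    sigma = 0.5 * (sp + sm)
    alpha = 0.5 * (sp - sm)
    if model == 1:
        bias = (sp - sm) / math.sqrt(2.0 * math.pi)
        return bias, sigma**2 + (1.0 - 2.0 / math.pi) * alpha**2
    return alpha, sigma**2 + 2.0 * alpha**2


def weighted_mean(measurements, model=1):
    """measurements: iterable of (x, sigma_plus, sigma_minus). Returns (estimate, variance)."""
    num = den = 0.0
    for x, sp, sm in measurements:
        b, v = bias_and_variance(sp, sm, model)
        num += (x - b) / v  # bias-corrected value weighted by 1/V
        den += 1.0 / v
    return num / den, 1.0 / den


# Hypothetical measurements (value, sigma_plus, sigma_minus)
estimate, variance = weighted_mean([(10.0, 1.2, 0.8), (10.5, 0.9, 0.7)], model=1)
print(estimate, math.sqrt(variance))
```

For symmetric errors the bias vanishes and the result is the familiar $1/\sigma^2$ weighted average.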



8. Asymmetric statistical errors

When the estimated value and range are obtained using a maximum likelihood esti- 
mate and the shape of the log likelihood is not parabolic, the one standard deviation limits 
are taken as the points at which the log likelihood falls by 0.5 from its peak [1]. 

The treatment of these errors will be given in a subsequent publication. Although 
treatment of asymmetric errors involves, for both systematic and statistical errors, the 
mapping of the actual distribution onto a Gaussian one, there is a considerable difference 
of interpretation. It is, however, worth pointing out that if two separate statistical effects 
are combined - say two backgrounds from different sources - then the combined background 
is the simple arithmetic sum of the two with no shift to the central value. This is because, 
for these statistical errors, the value quoted is the mean. 



9. Summary 

The treatment of asymmetric systematic errors cannot be based on secure foundations, 
and if they cannot be avoided they need careful handling. The practitioner needs to choose 
a model for the dependence, which could be one of the two proposed here. 

In combining asymmetric errors, the traditional procedure of adding positive and neg- 
ative values separately in quadrature is unjustifiable. Instead, values should be determined 
which, within the limitations of the model, give the correct mean, variance, and skew. A 
program is available to do this on http://www.slac.stanford.edu/~barlow.



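The $\chi^2$ contribution for a value with asymmetric errors can be represented by

$$\chi^2 = \left( \frac{\delta}{\sigma} \right)^2 \left( 1 - 2A \left( \frac{\delta}{\sigma} \right) + 5A^2 \left( \frac{\delta}{\sigma} \right)^2 \right)$$

where

$$\sigma = \frac{\sigma^+ + \sigma^-}{2}, \qquad A = \frac{\sigma^+ - \sigma^-}{\sigma^+ + \sigma^-}$$

In forming a weighted sum one should use

$$\hat{x} = \frac{\sum (x_i - b_i)/V_i}{\sum 1/V_i}$$

where the bias $b$ and variance $V$ are given by Equations 17 and 18 above.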